Visualization, Search and Analysis of Hierarchical Translation Equivalence in Machine Translation Data
نویسندگان
چکیده
Translation equivalence constitutes the basis of all Machine Translation systems including the recent hierarchical and syntax-based systems. For hierarchical MT research it is important to have a tool that supports the qualitative and quantitative analysis of hierarchical translation equivalence relations extracted from word alignments in data. In this paper we present such a toolkit and exemplify some of its uses. The main challenges taken up in designing this tool are the efficient and compact, yet complete, representation of hierarchical translation equivalence coupled with an intuitive visualization of these hierarchical relations. We exploit a new hierarchical representation, called Hierarchical Alignment Trees (HATs), which is based on an extension of the algorithms used for factorizing n-ary branching SCFG rules into their minimally-branching equivalents. Our toolkit further provides a search capability based on hierarchically relevant properties of word alignments and/or translation equivalence relations. Finally, the tool allows detailed statistical analysis of word alignments, thereby providing a breakdown of alignment statistics according to the complexity of translation equivalence units or reordering phenomena. We illustrate this with an empirical study of the coverage of inversion-transduction grammars for a number of corpora enriched with manual or automatic word alignments, followed by a breakdown of corpus statistics to reordering complexity.
منابع مشابه
بهبود و توسعه یک سیستم مترجمیار انگلیسی به فارسی
In recent years, significant improvements have been achieved in statistical machine translation (SMT), but still even the best machine translation technology is far from replacing or even competing with human translators. Another way to increase the productivity of the translation process is computer-assisted translation (CAT) system. In a CAT system, the human translator begins to type the tra...
متن کاملA new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
متن کاملA Comparative Study of English-Persian Translation of Neural Google Translation
Many studies abroad have focused on neural machine translation and almost all concluded that this method was much closer to humanistic translation than machine translation. Therefore, this paper aimed at investigating whether neural machine translation was more acceptable in English-Persian translation in comparison with machine translation. Hence, two types of text were chosen to be translated...
متن کاملSystemic Functional Linguistics as a Tool of Text Analysis for Translation
Translation, ipso facto, is an understanding and a transferal of meaning from one language into another. Therefore, it may be fitting to conclude that a suitable semantic theory should underpin any attempt to that end. This paper advocates implementing Systemic Functional Linguistics (henceforth SFL) which subscribes to a view of language as a "meaning-potential". In fact, Halliday and Matthies...
متن کاملLoss of the Socio-cultural Implicit Meanings in the English Translations of Mu’allaqat
Abstract Translation of literary texts, especially poetry, is one of the most difficult tasks; it requires mastery and knowledge of the language system and culture, and lack of this might lead to wrong translation. This study aimed to examine the loss and gain of the sociocultural implicit meanings in the English translations of the Mu’allaqat, and assess whether the translators of the Mu’allaq...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Prague Bull. Math. Linguistics
دوره 101 شماره
صفحات -
تاریخ انتشار 2014